Directed networks are ubiquitous, from food webs to the World Wide Web, but the directionality of their interactions has been disregarded in most studies of global network structure. One important global property is the tendency of nodes with similar numbers of edges to be connected. This tendency, called assortativity, affects crucial structural and dynamic properties of real-world networks. Here we demonstrate the importance of edge direction by studying assortativity in directed networks. We define a set of four directed assortativity measures. By comparison to randomized networks, we discover significant features of three network classes: online/social networks, food webs, and word-adjacency networks. The full set of measures is needed to reveal patterns common to the class or to separate networks that have been previously classified together. Our measures expose limitations of existing theoretical models, and show that many networks are not purely assortative or disassortative but a mixture of the two.

Complex systems, characterized by diverse, strongly interacting components, can often be represented as networks [1]. In a network, nodes represent components of the system and edges between nodes represent interactions between components [2, 3, …]. Networks from diverse fields share global, whole-network properties, including a broad distribution of degrees (number of edges attached to a node) [3], short average distance between nodes […], high error tolerance […], and a modular structure [7]. These common properties suggest that complex networks share universal organizational principles [2, 3]. It is of equal interest to discover properties in which networks differ. These properties can identify the sources of the structural and dynamic diversity of networks, and can be used to classify networks on the basis of shared architecture. Such properties can be local (e.g. motifs: local connection patterns appearing more frequently in the real-world network than in randomized ensembles [8, 9]) or global (e.g. assortativity: the tendency of nodes to connect to nodes with a similar number of edges […, 10]).

Assortativity affects important structural and dynamic properties of networks. In an assortative network, high-degree nodes tend to connect to other high-degree nodes; hence assortative networks remain connected despite node removal and failure [11], but are hard to immunize against the spread of epidemics [12]. In a disassortative network, conversely, high-degree nodes tend to connect to low-degree nodes [10, …]; these networks limit the effects of node failure because important, high-degree nodes are unlikely to be connected to each other [13]. Assortativity is measured by the Pearson correlation r of node degrees at either end of each edge in the network [10, 11]. This quantity ranges from −1 to 1, with r > 0 in an assortative network and r < 0 in a disassortative network. Earlier work suggested a simple classification on the basis of assortativity, in which social networks are assortative and biological and technological networks are disassortative [4, 10, 11]; but see [14].

In many complex systems, however, interactions are directional. In directed networks, an edge from source to target (A → B) indicates, for example, that organism A is eaten by organism B. Although edge direction is essential to classifications on the basis of local structure (i.e. motifs [8, 9]), the study of global network properties has largely disregarded edge direction. In particular, assortativity in directed networks has been studied by ignoring edge direction [10] or by measuring one […] or two [15] out of four possible degree-degree correlations; see also [16]. Here we show that assortativity becomes a powerful tool for characterizing directed networks only when we consider the directionality and nature of the interactions being represented. The pattern across all four correlation measures reveals common structural features in classes of directed networks, and distinguishes between networks grouped together on other criteria [9]. We also show the limitations of existing theoretical models of some types of directed network.

Directed Assortativity 

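Nodes in directed networks have both an in-degree (number of incoming edges) and an out-degree (number of outgoing edges). Hence we introduce a set of directed assortativity measures to capture this feature. Figure 1 illustrates the four possible degree-degree correlations, with examples typical of assortative or disassortative networks. Let α, β ∈ {in, out} index the degree type, and let $j_i^{\alpha}$ and $k_i^{\beta}$ be the α-degree of the source node and the β-degree of the target node of edge i. Then a set of assortativity measures can be defined using the Pearson correlation:

$$ r(\alpha,\beta) = \frac{E^{-1}\sum_{i}\left(j_i^{\alpha}-\bar{j}^{\alpha}\right)\left(k_i^{\beta}-\bar{k}^{\beta}\right)}{\sigma^{\alpha}\sigma^{\beta}} \tag{1} $$

where E is the number of edges in the network, $\bar{j}^{\alpha} = E^{-1}\sum_i j_i^{\alpha}$, $\sigma^{\alpha} = \sqrt{E^{-1}\sum_i \left(j_i^{\alpha}-\bar{j}^{\alpha}\right)^2}$, and $\bar{k}^{\beta}$ and $\sigma^{\beta}$ are similarly defined. The order of the arguments encodes direction: each edge runs from the node with the α-indexed degree to the node with the β-indexed degree, so r(α, β) and r(β, α) are in general different measures (Methods).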
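As a minimal sketch of Eq. (1), assuming the network is given as a Python list of distinct (source, target) pairs, the four measures can be computed with the standard library alone. The helper names `pearson` and `directed_assortativity` are hypothetical illustrations, not code from this work; the population form of the covariance and standard deviations is used, matching the $E^{-1}$ normalization in Eq. (1).

```python
from math import sqrt

def pearson(xs, ys):
    """Population Pearson correlation; returns nan if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def directed_assortativity(edges):
    """Return {(alpha, beta): r(alpha, beta)} for a directed edge list.

    alpha indexes the degree of the source node of each edge,
    beta the degree of its target node.
    """
    degree = {"in": {}, "out": {}}
    for s, t in edges:
        degree["out"][s] = degree["out"].get(s, 0) + 1
        degree["in"][t] = degree["in"].get(t, 0) + 1
    r = {}
    for alpha in ("out", "in"):
        for beta in ("out", "in"):
            j = [degree[alpha].get(s, 0) for s, _ in edges]  # alpha-degree of each source
            k = [degree[beta].get(t, 0) for _, t in edges]   # beta-degree of each target
            r[(alpha, beta)] = pearson(j, k)
    return r

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0)]
print(directed_assortativity(edges)[("out", "in")])  # approximately -0.612
```

Only the edge list is needed: each measure is one pass to count degrees and one pass to correlate the degrees at the two ends of every edge.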

We compare the degree-degree correlations in a real-world network to an ensemble of randomized networks with the same in- and out-degree sequence (the number of nodes $n(k^{\rm in}, k^{\rm out})$ with in-degree $k^{\rm in}$ and out-degree $k^{\rm out}$; hereafter degree sequence) […, 9, …, 11] (Methods). This comparison distinguishes features that are typical of networks with a given degree sequence from those that may reflect other organizational or structural principles of the real-world network. The comparison assigns each correlation r(α, β) a statistical significance (Z-score):

$$ Z(\alpha,\beta) = \frac{r_{\rm rw}(\alpha,\beta) - \langle r_{\rm rand}(\alpha,\beta)\rangle}{\sigma\left(r_{\rm rand}(\alpha,\beta)\right)} \tag{2} $$

which quantifies the difference between the assortativity measure of the real-world network, r_rw(α, β), and the average assortativity measure in the randomized ensemble, ⟨r_rand(α, β)⟩, in units of the standard deviation of the latter, σ(r_rand(α, β)). To account for the fact that larger networks typically have larger Z-scores, we normalize the individual Z-scores to define an Assortativity Significance Profile (ASP):

$$ \mathrm{ASP}(\alpha,\beta) = \frac{Z(\alpha,\beta)}{\left[\sum_{\alpha',\beta'} Z(\alpha',\beta')^{2}\right]^{1/2}} $$

A positive ASP(α, β) ("Z-assortative") indicates that the real-world network is more assortative in that measure than is typical for networks with its degree sequence; a negative ASP(α, β) ("Z-disassortative") indicates that it is less assortative than is typical.
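A minimal sketch of Eq. (2) and the ASP normalization, assuming the ensemble values ⟨r_rand⟩ are available as a list of samples. The function names are hypothetical, and using the sample (n − 1) standard deviation for σ is our assumption; the text does not specify which estimator was used.

```python
from math import sqrt
from statistics import mean, stdev

def z_score(r_real, r_rand_samples):
    """Eq. (2): (r_rw - <r_rand>) / sigma(r_rand), with the sample standard deviation."""
    return (r_real - mean(r_rand_samples)) / stdev(r_rand_samples)

def assortativity_significance_profile(z):
    """Normalize a dict of Z-scores {(alpha, beta): Z} to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in z.values()))
    return {key: v / norm for key, v in z.items()}

# Hypothetical Z-scores, for illustration only (not taken from the paper's tables).
z = {("out", "in"): -3.0, ("in", "out"): 4.0, ("out", "out"): 0.0, ("in", "in"): 0.0}
print(assortativity_significance_profile(z))
```

Because the ASP is a unit vector, its entries are comparable across networks of very different sizes even when the raw Z-scores differ by orders of magnitude.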

By assigning statistical significance to each directed assortativity measure, we can identify potential functional features of real-world networks […]. Our analysis includes social, technological, biological and lexical networks; Supplementary Table 1 provides full descriptions and sources for all networks analysed. We find that many directed networks are not simply assortative or disassortative; rather, they can be assortative in some measures and disassortative in others. Supplementary Table 2 collects the full results and all error estimates.

Structural features of classes of directed networks 

We first consider online and social networks. Online networks are built collaboratively and share motif patterns with social networks, leading them to be grouped in the same "superfamily" [9]. In an online network, edges represent hyperlinks; in the social networks considered here, edges represent positive sentiment. Figure 2a shows the ASP of the World Wide Web and two social networks studied in [9]. Each network differs significantly in its ASP, showing that our ASP measure discriminates between networks with similar motif structure. Figure 2b shows the ASP of the WWW, Wikipedia, and a network of political blogs. All three networks are (out, in) Z-disassortative, indicating that the small disassortative effects measured previously […] represent substantial deviations from typical behaviour and thus reflect important growth mechanisms or functional constraints; note that comparison to randomized ensembles is essential to reveal this fact. The WWW and Wikipedia are also (in, out) Z-assortative; this property has not been measured before, and indicates that pages with high in-degree (corresponding to "authorities" [20]) link to pages with high out-degree ("hubs" [20]) more frequently than expected from the degree sequence. All three online networks show no assortative or disassortative tendency in the (out, out) or (in, in) measures, consistent with previous work on the average neighbor in-degree in Wikipedia [21].

Models of online network growth should reproduce the qualitative features of each online ASP. We tested a directed preferential attachment model for the WWW (Methods) [22]. Figure 2c shows that this model fails, in three independent realizations, to generate any of the ASP characteristics of the WWW. As shown in Figure 2d, r(in, out) is extremely small in the growth model, in contrast to the large assortative r(in, out) = 0.2567 of the WWW.

The three online ASPs cannot be explained by the degree sequence or simple models of network growth, and hence indicate other structural or functional factors at play. (Out, in) Z-disassortativity may reflect that hyperlinking and (more generally) information have a hierarchical structure, e.g. the existence of distinct "high-level" topics, much as disassortativity in protein interaction networks captures the existence of weakly connected modules [13]. The (in, out) assortativity and Z-assortativity of the WWW are especially pertinent for how users "flow" through the Web. High in-degree nodes (authorities) may "win" their status because they aggregate links to useful pages, combining with useful pages to become high in-/high out-degree "superhubs" that provide access to large parts of the network while structuring the search process.

We now turn to food webs [23]. Recall that a directed edge from species A to species B means that A is eaten by B. Food webs from diverse ecosystems display universal properties, e.g. a common form for the in- and out-degree distributions [24, 25]. Previous work indicated that food webs are disassortative in the (out, in) measure [11]. As shown in Figure 3a, r(out, in) is disassortative for all food webs; yet the (out, in) ASP measure in Figure 3b ranges widely, from Z-disassortative to Z-assortative. Thus, once the degree sequence is taken into account, no common pattern remains in this measure.

In contrast, all of the food webs are both disassortative and Z-disassortative in the (in, out) measure, meaning that organisms with a large number of prey species are eaten by organisms with a small number of predator species (and vice versa) more frequently than expected based on the degree sequence. This tendency captures the structuring of ecosystems into trophic levels [23], and is consistent with an overall "spindle" shape to the food web (fewer species in the upper and lower levels and a greater number in the middle) [26]. The small lower trophic levels follow from the general practice of aggregating the lowest units of the food web into broad categories like "plant", "detritus", etc. The consumers of these lowest units have very low in-degrees and are in turn consumed by predators of low trophic level (which have high out-degrees). The food webs are also assortative and Z-assortative in both the (out, out) and (in, in) measures.

To identify the origin of these patterns, we built two theoretical models for each web (Methods). Both models reproduce the number of species exactly and the number of edges to within 5%. The "cascade" model assigns each species a random "niche" value and randomly allows species to eat species of lower value [27]. The "niche" model permits cannibalism and eating of species with higher niche value [27]. Figures 3c and 3d show r(α, β) and ASP(α, β) for the cascade and niche models of a particular food web (St. Marks). The model webs shown are typical of each model. While typical cascade and niche model webs qualitatively reproduce the pattern observed in Figure 3a, the ensemble of niche model realizations for a given food web displays large variance (Methods). The large variance in the niche model ensemble favors the cascade model and suggests that ordering species along a single niche dimension largely explains the observed patterns in r(α, β) and ASP(α, β) for real-world food webs. Neither model, however, typically generates the (out, in) Z-assortativity of certain food webs.

Finally, we analyse word-adjacency networks, in which edges point from each word to any word that immediately follows it at any point in a selected book [9]. For example, the phrase "for example" contributes the edge (for → example). The four book networks are strongly disassortative across r(α, β); see Figure 4a. Figure 4b shows that they are also similarly disassortative in their ASP.

The in- and out-degrees of nodes in these networks are both increasing functions of word frequency [28]; thus the correlation between the in- and out-degrees of a node is high (r_auto > 0.86). Very high frequency words generally have grammatical function but low "semantic content" [29]. While the large r_auto guarantees that the values for all four measures will be similar, disassortativity across all measures could result from two possible mechanisms.
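The sketch below builds a word-adjacency network from a token list and computes r_auto, the Pearson correlation between the in- and out-degree of each node. Splitting on whitespace without further normalization is our assumption for illustration; the text does not say how the books were tokenized. The function names are hypothetical.

```python
from math import sqrt

def word_adjacency_edges(words):
    """Distinct directed edges (w1, w2) for every pair of consecutive words."""
    return set(zip(words, words[1:]))

def r_auto(edges):
    """Pearson correlation between the in- and out-degree of each node."""
    in_deg, out_deg = {}, {}
    for a, b in edges:
        out_deg[a] = out_deg.get(a, 0) + 1
        in_deg[b] = in_deg.get(b, 0) + 1
    nodes = sorted(set(in_deg) | set(out_deg))
    xs = [in_deg.get(v, 0) for v in nodes]
    ys = [out_deg.get(v, 0) for v in nodes]
    n = len(nodes)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

words = "the cat saw the dog and the cat ran".split()
edges = word_adjacency_edges(words)
print(len(edges), round(r_auto(edges), 3))  # 7 0.542
```

Repeated word pairs (here "the cat") collapse into a single edge, so degrees count distinct neighbouring words rather than occurrences.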

Milo et al. propose a bipartite model (Methods), with a few high-frequency grammatical words and many low-frequency content words; grammatical words must be followed by content words, and vice versa [9]. This model generates excessively negative values across all r(α, β), as shown in Figure 4a (Bipartite). When compared to the appropriate randomized ensemble, however, it reproduces the roughly equal, negative ASP(α, β) of the real-world networks; see Figure 4b. Alternatively, the observed disassortativity could result from the broad word-frequency distribution (Zipf's law [28]). We scrambled the English text (Scrambled) to produce a text with identical word-frequency distribution but no grammatical structure (Methods). This scrambled text has r(α, β) very close to the empirical values, as shown in Figure 4a; but it is Z-assortative across all measures (Figure 4b), unlike the real-world networks. In addition, neither model yields the relative magnitude of ASP(out, in) and ASP(in, out), indicating that this difference results from genuine linguistic structure.

Conclusion 

Taken together, our results demonstrate the importance of edge direction and the value of assortativity in the analysis of directed networks. Many directed networks are not purely assortative or disassortative, but a mixture of the two. By comparison with randomized ensembles, we are able to detect novel and statistically significant features like (in, out) assortativity (or "superhubs") in the WWW. Our measures identify common features of classes of networks (see Supplementary Figures 1 and 2), and can be usefully compared to a local analogue, the Triad Significance Profile (TSP), which measures the significance of three-node motifs [9]. The measures r(α, β) and ASP(α, β) are more computationally tractable and scalable, requiring only the list of edges in the network; they also discriminate between networks grouped together by TSP (online/social), while confirming the classification of word-adjacency networks [9]. We were able to test theoretical models for all three network classes, rejecting the preferential attachment model of WWW growth and both the bipartite and scrambled text models of word-adjacency networks. Our measures reveal possible functional features, and their straightforward interpretation leads to simple questions: for example, do the connections between authorities and hubs in the WWW revealed by positive r(in, out) reflect the demands of network navigation, spreading user flows across the network, whereas the negative r(in, out) in food webs reflects the opposite tendency to concentrate energy flows at higher trophic levels? Such questions suggest wide application of our techniques in investigating the structure and function of directed networks.
Methods

Technical Issues in Assortativity Measures 

Strictly, Newman defines r in terms of the excess degree, i.e. the degree of the node minus 1 for the edge under consideration. However, as he observes in [10], the correlation coefficients are exactly the same if the degree is used. Identical Z-score results are obtained for any assortativity measure that is related to the Pearson coefficient r(α, β) by a linear transformation with positive slope, e.g. the s-metric of Alderson and Li [30]; thus, when statistical significance is properly measured, it is sufficient to use the familiar Pearson coefficient.
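Both invariances can be checked numerically. The sketch below uses illustrative numbers, not data from this work: shifting every degree by 1 (excess degree) leaves the Pearson coefficient unchanged, and any measure m = a·r + b with a > 0 has the same Z-score as r. A negative slope would flip the sign of Z, which is why the slope must be positive.

```python
from math import sqrt
from statistics import mean, stdev

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def z_score(value, samples):
    return (value - mean(samples)) / stdev(samples)

# Degrees at the two ends of each edge (illustrative numbers).
j = [3, 1, 4, 1, 5, 9, 2, 6]
k = [2, 7, 1, 8, 2, 8, 1, 8]
r_degree = pearson(j, k)
r_excess = pearson([x - 1 for x in j], [y - 1 for y in k])  # excess degree = degree - 1
print(abs(r_degree - r_excess) < 1e-12)  # True

# A measure m = a*r + b with a > 0 has the same Z-score as r itself.
r_real, r_rand = 0.25, [0.05, 0.10, 0.02, 0.08, 0.11]
a, b = 3.0, -0.7
print(abs(z_score(r_real, r_rand) - z_score(a * r_real + b, [a * x + b for x in r_rand])) < 1e-9)  # True
```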

Constructing the Ensemble of Randomized Networks 

In order to identify features of real-world networks having functional significance, it is necessary to compare the network to a null model capturing certain basic structural features of the real-world network. The simplest null model assumes that the single-node properties of the network are of primary importance, and compares the real-world network to an ensemble of randomized networks with the same fixed degree sequence (hereafter FDS ensemble); see […, 11] for a detailed justification. We sample from this ensemble using the Monte Carlo rewiring algorithm described below. The rewiring algorithm starts with a directed network with a given in- and out-degree sequence $n(k^{\rm in}, k^{\rm out})$ and, by randomly swapping links between nodes many times, samples from the ensemble of networks sharing that same degree sequence.

Each rewiring step proceeds as follows. Two directed edges i and j are chosen from the network at random. Each of these edges points from a "source" node to a "target" node. The algorithm proposes two candidate edges, in which the source node of edge i points to the target node of edge j and the source node of edge j points to the target node of edge i. If either candidate edge is already present in the network, no rewiring is performed in this step. This ensures that no multiple connections are induced by rewiring. If neither candidate edge is already present, then the randomly selected edges i and j are removed from the network and replaced by the candidate edges. By performing many such rewiring steps, the edges in the network are randomized, but the in- and out-degree sequence of the network is maintained. Note that we do not fix the number of two-way edges, as is done in some approaches.

Self-edges (edges pointing from a node to itself) can be either allowed or disallowed. If the real-world or model network contains self-edges, we allow them in the sampled networks; otherwise we do not allow self-edges. If self-edges are not allowed, any rewiring step which produces a self-edge is rejected. In practice, we found that the presence or absence of self-edges in the sampled networks produces no significant change in the network properties measured.

To produce a randomly sampled member of the FDS ensemble, we performed 10^… edge swaps on the network between samples. In two cases, this number of swaps was not enough to fully randomize the networks between samples: for the World Wide Web and related models, 10^… edge swaps were performed between each sample, and for the Wikipedia network, 10^… edge swaps were performed between each sample.

Note that in most cases the real-world or model network is not a typical member of its FDS ensemble. Thus, when we begin our sampling of the FDS ensemble, we perform ten times the inter-sample number of rewiring steps on the original network to ensure that we are truly sampling typical FDS ensemble networks.
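The rewiring procedure above, including the self-edge rule and the tenfold initial burn-in, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the network is a list of distinct (source, target) pairs, and `rewire` and `sample_fds_ensemble` are hypothetical names. The swap count between samples is a parameter because the actual values are not legible here.

```python
import random

def rewire(edges, n_swaps, allow_self_edges, rng):
    """Degree-preserving edge swaps on a simple directed graph (list of (source, target))."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (s1, t1), (s2, t2) = edges[a], edges[b]
        cand1, cand2 = (s1, t2), (s2, t1)
        # Reject the step if either candidate already exists (no multiple edges).
        if cand1 in present or cand2 in present:
            continue
        # Reject the step if it would create a forbidden self-edge.
        if not allow_self_edges and (s1 == t2 or s2 == t1):
            continue
        present.difference_update((edges[a], edges[b]))
        present.update((cand1, cand2))
        edges[a], edges[b] = cand1, cand2
    return edges

def sample_fds_ensemble(edges, n_samples, swaps_between, seed=0):
    """Draw n_samples members of the fixed-degree-sequence (FDS) ensemble."""
    rng = random.Random(seed)
    allow_self = any(s == t for s, t in edges)  # allow self-edges only if the input has them
    # Burn-in: ten times the inter-sample number of swaps, as described above.
    current = rewire(edges, 10 * swaps_between, allow_self, rng)
    samples = []
    for _ in range(n_samples):
        current = rewire(current, swaps_between, allow_self, rng)
        samples.append(current)
    return samples
```

Every accepted swap keeps each source's out-degree and each target's in-degree, because each source still emits one edge and each target still receives one. The sampled networks can then be passed to any routine computing r(α, β) to obtain ⟨r_rand⟩ and σ(r_rand).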

We estimate errors in the average values for these ensembles by assuming that the errors are normally distributed and that, after i samples, the difference between the mean value of an observable up to that point, $\langle A\rangle_i = i^{-1}\sum_{j=1}^{i} A_j$, and the final mean $\langle A\rangle$ is less than $b/\sqrt{i}$ in absolute value, for some constant b. Plotting the difference as a function of $i^{-1/2}$ and choosing b to contain approximately 90% of the data points gives an estimate of the error in the final mean.
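A minimal sketch of this error estimate, under two assumptions of ours that the text does not spell out: "containing approximately 90% of the data points" is implemented as the 90% quantile of $|\langle A\rangle_i - \langle A\rangle|\sqrt{i}$, and the reported error of the final mean over N samples is taken to be $b/\sqrt{N}$. The function name is hypothetical.

```python
from math import sqrt

def mean_and_error(values, coverage=0.9):
    """Final mean <A> and an error estimate b / sqrt(N).

    Assumes |<A>_i - <A>| < b / sqrt(i); b is taken as the `coverage` quantile
    of |<A>_i - <A>| * sqrt(i) over all i = 1..N.
    """
    n = len(values)
    final = sum(values) / n
    scaled = []
    running = 0.0
    for i, v in enumerate(values, start=1):
        running += v
        scaled.append(abs(running / i - final) * sqrt(i))
    scaled.sort()
    b = scaled[min(n - 1, int(coverage * n))]
    return final, b / sqrt(n)

print(mean_and_error([1.0, 3.0] * 50))
```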
World Wide Web Growth Model

The growth model for the World Wide Web is taken from [22]; we summarize it here for completeness, retaining the original notation. This model constructs a directed network approximating the power-law in-degree and out-degree distributions of a target real-world network, $n(k^{\rm in}) \propto (k^{\rm in})^{-\nu_{\rm in}}$ and $n(k^{\rm out}) \propto (k^{\rm out})^{-\nu_{\rm out}}$. The model is parameterized by the number of nodes in the network, N; the average out-degree ⟨k^out⟩ (equal to the average in-degree); and the exponents of the in- and out-degree distributions, ν_in and ν_out. At every step of the growth model, two events are possible. With probability p a new node is born and attaches to an existing node in the network with a directed edge going from the new node to the existing node; the target node is chosen with probability depending on its in-degree i. With probability q = 1 − p a directed edge appears between two existing nodes, with the source and target nodes selected with probabilities depending on the out-degree of the source and the in-degree of the target. Since every step adds exactly one edge, and a node only with probability p, the growth model will produce a network with the desired ⟨k^out⟩ when 1/p = ⟨k^out⟩. The probability of attachment in the first process for a target node of in-degree i is proportional to $A_i = i + \lambda$; the probability in the second process of an edge between a source node with out-degree j and a target node with in-degree i is proportional to $C(j, i) = (i + \lambda)(j + \mu)$. The parameters λ and μ can be chosen such that the target exponents are approximated; the conditions are $\nu_{\rm in} = 2 + p\lambda$ and $\nu_{\rm out} = 1 + q^{-1} + \mu p q^{-1}$. We initialize the model with two unconnected nodes and run until the network has N nodes. Generically this will produce multiple edges with the same source and target nodes. We eliminate these to yield a simple graph; this does not substantially alter the degree distributions. Note from Supplementary Table 1 that the number of edges E for the model networks is quite close to the real-world value. We report here the exponents. For the World Wide Web data set we estimate ν_in = 2.32 and ν_out = 2.66. For the three model webs, the exponents are indistinguishable and are ν_in = 2.2 ± 0.2 and ν_out = 2.5 ± 0.2. Note that the networks generated by this model are Z-assortative across all four measures.
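The sketch below follows the model as summarized above; it is a small-scale illustration, not the code used for the three WWW model webs. The function names are hypothetical, it assumes λ, μ > 0 so that all attachment weights are positive, and it rebuilds the weight list at every step, which is O(N²) overall and only practical for small N. Because C(j, i) = (i + λ)(j + μ) factorizes, the source and target of an internal edge can be drawn independently.

```python
import random

def krr_parameters(mean_out, nu_in, nu_out):
    """Invert 1/p = <k_out>, nu_in = 2 + p*lam, nu_out = 1 + 1/q + mu*p/q."""
    p = 1.0 / mean_out
    q = 1.0 - p
    lam = (nu_in - 2.0) / p
    mu = (nu_out - 1.0 - 1.0 / q) * q / p
    return p, lam, mu

def grow_network(n_nodes, p, lam, mu, seed=0):
    """Directed growth model; returns a simple graph as a sorted list of (source, target)."""
    rng = random.Random(seed)
    in_deg, out_deg = [0, 0], [0, 0]  # start from two unconnected nodes
    edges = []
    while len(in_deg) < n_nodes:
        n = len(in_deg)
        # In both processes the target is an existing node drawn with weight i + lambda.
        target = rng.choices(range(n), weights=[i + lam for i in in_deg])[0]
        if rng.random() < p:
            source = n  # a new node is born and links to the chosen target
            in_deg.append(0)
            out_deg.append(0)
        else:
            # Existing source drawn independently with weight j + mu.
            source = rng.choices(range(n), weights=[j + mu for j in out_deg])[0]
        edges.append((source, target))
        out_deg[source] += 1
        in_deg[target] += 1
    return sorted(set(edges))  # drop multiple edges to obtain a simple graph

p, lam, mu = krr_parameters(4.596, 2.32, 2.66)  # WWW values quoted above
edges = grow_network(300, p, lam, mu, seed=1)
```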
Cascade and Niche Models

The cascade and niche models of food webs are taken from [27]; we summarize them here, retaining the original notation. Both models are parameterized by the number of species in the target real-world food web, N, and the connectance $C = E/N^2$, where E is the number of edges in the food web. In the cascade model, every species is assigned a random "niche" value chosen from the uniform distribution on [0, 1]. With probability $P = 2CN/(N - 1)$ a given species will consume a species with lower niche value; i.e. we add a directed edge from the latter to the former. Since there are N(N − 1)/2 such pairs, this generates model food webs having on average $P N(N-1)/2 = CN^2 = E$ edges, the same number as the target food web.
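A minimal sketch of the cascade model as described, with edges pointing from prey to predator ("is eaten by"). The function name and seeding are hypothetical; the St. Marks size (N = 48, E = 221) is taken from Supplementary Table 1 purely as an example input.

```python
import random

def cascade_model(n_species, n_edges, seed=0):
    """Cascade model: each species eats each lower-niche species with probability P = 2CN/(N - 1)."""
    rng = random.Random(seed)
    connectance = n_edges / n_species ** 2            # C = E / N^2
    prob = 2 * connectance * n_species / (n_species - 1)
    niche = [rng.random() for _ in range(n_species)]  # uniform niche values in [0, 1)
    edges = []
    for predator in range(n_species):
        for prey in range(n_species):
            if niche[prey] < niche[predator] and rng.random() < prob:
                edges.append((prey, predator))        # prey -> predator: prey is eaten by predator
    return niche, edges

niche, edges = cascade_model(48, 221, seed=3)  # St. Marks size
print(len(edges))
```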

In the niche model, every species i is assigned a random niche value n_i from the uniform distribution on [0, 1] as before. To permit cannibalism and the eating of species with higher niche value, each species consumes every species falling within some range r_i. The center of the range, c_i, is chosen uniformly from [0.5 r_i, n_i]. The range is chosen such that the expected connectance is that of the real-world web. This can be guaranteed by setting $r_i = x_i n_i$ and drawing the random variable x_i from a beta distribution $f(x_i \mid 1, \beta) = \beta(1 - x_i)^{\beta - 1}$, $0 < x_i < 1$, with expected value $E(x_i) = 1/(1 + \beta) = 2C$. Thus letting $\beta = (1 - 2C)/(2C)$ yields the connectance of the real-world food web, on average. The species of smallest niche value is assigned to be the "basal species" [27]. We do not check for disconnected or trophically identical species, as these are quite rare.
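A minimal sketch of the niche model as summarized above, again with edges from prey to predator. Treating the basal species as having r = 0 (so it eats nothing) is our reading of "basal species"; the function name and seeding are hypothetical. Python's `random.betavariate(1, beta)` supplies the Beta(1, β) draw.

```python
import random

def niche_model(n_species, n_edges, seed=0):
    """Niche model: species i eats every species whose niche value lies in [c_i - r_i/2, c_i + r_i/2]."""
    rng = random.Random(seed)
    connectance = n_edges / n_species ** 2
    beta = (1 - 2 * connectance) / (2 * connectance)  # so that E(x) = 1/(1 + beta) = 2C
    niche = [rng.random() for _ in range(n_species)]
    basal = min(range(n_species), key=niche.__getitem__)  # species with smallest niche value
    edges = []
    for i in range(n_species):
        if i == basal:
            continue                                   # basal species eats nothing (r = 0)
        r = rng.betavariate(1, beta) * niche[i]        # feeding range r_i = x_i * n_i
        c = rng.uniform(r / 2, niche[i])               # centre of the feeding range
        for j in range(n_species):
            if c - r / 2 <= niche[j] <= c + r / 2:
                edges.append((j, i))                   # j is eaten by i (cannibalism allowed)
    return niche, edges

niche, edges = niche_model(48, 221, seed=3)  # St. Marks size
print(len(edges))
```

Single realizations fluctuate strongly around the target E, which is consistent with the large niche-model variance reported in the text.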

For each real-world food web, we generated 500 cascade model and 500 niche model networks. All 500 networks had E within 5% of the real-world food web; model realizations not meeting this criterion were rejected from the ensemble. To identify typical networks (shown in the paper and described in Supplementary Tables 1 and 2) we selected the model network with the smallest distance to the average values of r(α, β), considered as points in $\mathbb{R}^4$. We also measured the standard deviations in each ensemble; these are displayed in Supplementary Table 3. Note that the standard deviations for the niche model are generally quite large. Note also that, unlike any of the other theoretical models considered, both the cascade and the niche models generate networks with a mixture of assortative and disassortative (and Z-assortative and Z-disassortative) measures.
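The rejection step and the choice of a typical network can be sketched generically. Here `generate` (returning an edge list) and `measure` (returning the four r(α, β) values) are hypothetical callables standing in for a model generator and an assortativity routine, and using Euclidean distance in $\mathbb{R}^4$ is our assumption.

```python
from math import dist

def build_ensemble(generate, target_edges, size=500, tol=0.05, max_tries=1_000_000):
    """Call generate() until `size` edge lists have E within tol * target_edges."""
    accepted, tries = [], 0
    while len(accepted) < size:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("too many rejected realizations")
        edges = generate()
        if abs(len(edges) - target_edges) <= tol * target_edges:
            accepted.append(edges)
    return accepted

def typical_member(networks, measure):
    """Network whose vector of r(alpha, beta) values is closest to the ensemble mean."""
    vectors = [measure(net) for net in networks]
    centre = [sum(coords) / len(vectors) for coords in zip(*vectors)]
    best = min(range(len(networks)), key=lambda idx: dist(vectors[idx], centre))
    return networks[best]
```

For example, `build_ensemble(lambda: cascade_edges(), 221)` would collect 500 accepted realizations for a web with E = 221, where `cascade_edges` is a hypothetical zero-argument wrapper around a model generator.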

Bipartite and Scrambled Text Models 

The Bipartite model is taken from [9]. This model assumes that there are two categories of words: a few high-frequency grammatical words and many low-frequency content words. Words of the first type must alternate with words of the second type. The resultant word-adjacency network will necessarily be bipartite, with edges permitted from grammatical words to content words and vice versa. To build this model, we assume N_gram = 10 and N_cont = 1000. We go through all possible pairs of grammatical words and content words and draw a random number x. If x < p = 0.06 we put an edge from the grammatical word to the content word; if p < x < 2p we put an edge from the content word to the grammatical word; and if 2p < x < 2p + q, for q = 0.003, we put an edge going each way. The values of p and q are taken from [9].
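A minimal sketch of this construction; the word labels (`gram0`, `cont0`, ...) and the function name are hypothetical. The expected number of edges is N_gram · N_cont · (2p + 2q) = 10 · 1000 · 0.126 = 1260, close to the E = 1290 reported for the Bipartite network in Supplementary Table 1.

```python
import random

def bipartite_word_model(n_gram=10, n_cont=1000, p=0.06, q=0.003, seed=0):
    """Bipartite word-adjacency model: edges only between grammatical and content words."""
    rng = random.Random(seed)
    edges = []
    for g in range(n_gram):
        for c in range(n_cont):
            gram, cont = f"gram{g}", f"cont{c}"
            x = rng.random()
            if x < p:
                edges.append((gram, cont))             # grammatical -> content
            elif x < 2 * p:
                edges.append((cont, gram))             # content -> grammatical
            elif x < 2 * p + q:
                edges += [(gram, cont), (cont, gram)]  # one edge each way
    return edges

edges = bipartite_word_model(seed=1)
print(len(edges))
```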

We constructed the Scrambled Text Model by taking the underlying text for one of the word-adjacency networks (English; On the Origin of Species by Charles Darwin) and randomly scrambling the order of the words. The scrambling destroys any syntactic structure, although some grammatical features remain, namely the high frequency of articles, prepositions, etc. The assortativity across all ASP(α, β) of networks generated from the scrambled text is subtle but understandable. The high correlation between the in- and out-degrees of a node guarantees that all values will be similar. Because high-frequency words in the text are so common, they will occasionally follow one another; this means the Scrambled Text word-adjacency network will have some links between nodes with high in- and out-degrees. But since multiple links are disallowed, rewiring will, on average, destroy these links between high-degree nodes, thus making the ensemble less assortative than the Scrambled Text word-adjacency network, and all ASP(α, β) assortative.
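A minimal sketch of the scrambling step, assuming the text is already a list of word tokens. Using Python's `random.shuffle` (a uniform random permutation) is our choice; the text only says the word order was randomly scrambled. The function name is hypothetical.

```python
import random

def scrambled_word_adjacency(words, seed=0):
    """Shuffle the word order (keeping word frequencies) and rebuild the adjacency network."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    edges = set(zip(shuffled, shuffled[1:]))  # distinct consecutive pairs; repeats collapse
    return shuffled, edges

text = "the cat saw the dog and the cat ran to the dog".split()
shuffled, edges = scrambled_word_adjacency(text, seed=2)
print(len(edges))
```

Shuffling preserves each word's frequency exactly, so any difference from the original network must come from word order. Collapsing repeated pairs into one edge mirrors the rule that multiple links are disallowed.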
Figure 1: The four degree-degree correlations in directed networks. In each case the fuzzy edges indicate that nodes can have any number of edges of this type, as it does not enter into the specific correlation. For each correlation we show an example typical of assortative or disassortative networks.
Figure 2: Online networks differ from social networks and growth models. a, We plot the Assortativity Significance Profile (ASP) for a subset of the World Wide Web (in which edges represent hyperlinks) and two social networks (students in a leadership class and prisoners; edges represent positive sentiment). The three networks differ substantially, despite having similar motif patterns [9]. b, We show the ASP for the WWW, a snapshot of Wikipedia (edges represent hyperlinks), and a collection of political blogs (edges represent hyperlinks). All three online networks are more (out, in) disassortative than would be expected from the degree sequence alone; more surprisingly, the WWW and Wikipedia are significantly (in, out) assortative. c, d, Three realizations of the WWW growth model [22] fail to reproduce the features seen in the ASP(α, β) or r(α, β) of the WWW.
Figure 3: Simple models largely explain directed assortativity patterns of food webs. In food webs, a directed edge from A to B indicates that A is eaten by B. a, r(α, β) for food webs collected from several diverse ecosystems. Note the common pattern: disassortative in the first two and assortative in the second two measures. b, The Assortativity Significance Profile (ASP) for these food webs. Controlling for the degree distribution highlights common Z-disassortative and Z-assortative behaviours in the latter three measures but not in the (out, in) measure. c, d, The cascade and niche models are able to reproduce most common behaviours robustly.
Figure 4: Simple models cannot explain directed assortativity patterns of word-adjacency networks. A directed edge from word X to word Y indicates that X precedes Y at some point in the text under consideration. a, We plot r(α, β) for word-adjacency networks in four languages. The common pattern may result from grammatical structure or a broad word-frequency distribution. The Bipartite model overestimates the magnitude of r(α, β), as shown in a, while the Scrambled text model produces realistic values. b, We plot the Assortativity Significance Profile (ASP) for the same networks. The Bipartite model produces realistic values, while the Scrambled text model produces assortative values. The real-world networks are remarkably similar, despite ranging in size over an order of magnitude.
Supplementary Table 1: Network properties and sources. We show the class of network, the number of nodes N, the number of edges E, the average out-degree ⟨k_out⟩, whether or not the network has self-edges, the Pearson correlation r_auto between the in- and out-degrees of nodes in the network, and the source (see list below). Note that after reconstructing the adjacency matrix by hand from references […], we performed a trophic aggregation on all food webs, meaning that if two species had identical interactions, we combined them into one node. Further, all parasites were removed from the Ythan food web.



Network              Type       N        E         ⟨k_out⟩  Self-edges  r_auto   Source
Leadership           social     32       96        3.000    No          0.053    [1]
Prison               social     67       182       2.716    No          0.201    [1]
WWW                  online     325729   1497135   4.596    Yes         0.211    [1]
Wikipedia            online     1598583  19753078  12.357   Yes         0.203    [2]
Pol. Blogs           online     1224     19090     15.597   Yes         0.377    [3]
WWW Model 1          online     325729   1446887   4.442    Yes         0.526    [4]
WWW Model 2          online     325729   1448691   4.448    Yes         0.565    [4]
WWW Model 3          online     325729   1428052   4.384    Yes         0.391    [4]
Coachella            food web   29       262       9.034    Yes         -0.361   [5]
Little Rock          food web   95       1080      11.368   Yes         -0.242   [6]
St. Marks            food web   48       221       4.604    Yes         -0.227   [7]
St. Martin           food web   42       205       4.881    No          -0.368   [8]
Ythan                food web   82       395       4.817    Yes         -0.055   [9]
Coachella Niche      food web   29       259       8.931    Yes         -0.408   [10]
Little Rock Niche    food web   95       1056      11.116   Yes         -0.284   [10]
St. Marks Niche      food web   48       216       4.500    Yes         -0.258   [10]
St. Martin Niche     food web   41       208       5.073    No          -0.398   [10]
Ythan Niche          food web   82       386       4.707    Yes         -0.389   [10]
Coachella Cascade    food web   29       267       9.207    No          -0.907   [10]
Little Rock Cascade  food web   95       1098      11.558   No          -0.859   [10]
St. Marks Cascade    food web   48       223       4.646    No          -0.793   [10]
St. Martin Cascade   food web   42       205       4.881    No          -0.662   [10]
Ythan Cascade        food web   82       384       4.683    No          -0.702   [10]
Spanish              word adj.  11586    45129     3.895    No          0.913    [1]
Japanese             word adj.  2704     8300      3.070    No          0.927    [1]
French               word adj.  8325     24295     2.918    No          0.905    [1]
English              word adj.  8525     74921     8.788    Yes         0.876    [?]
Scrambled            word adj.  8525     118161    13.861   Yes         0.999    [?]
Bipartite            word adj.  746      1290      1.729    No          0.968    [1]
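The columns of Supplementary Table 1 and the trophic aggregation described in its caption can be sketched in Python as follows. This is a minimal illustration, assuming a network stored as a node list plus a set of (source, target) edges; for food webs it further assumes that edges point from prey to predator and that "identical interactions" means identical prey and predator sets. The function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either sample is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def network_properties(nodes, edges):
    """N, E, <k_out> = E/N, self-edge flag and r_auto for a directed network."""
    k_in = {v: 0 for v in nodes}
    k_out = {v: 0 for v in nodes}
    for s, t in edges:
        k_out[s] += 1
        k_in[t] += 1
    return {
        "N": len(nodes),
        "E": len(edges),
        "k_out_mean": len(edges) / len(nodes),
        "self_edges": any(s == t for s, t in edges),
        # r_auto: correlation between each node's in-degree and out-degree
        "r_auto": pearson([k_in[v] for v in nodes], [k_out[v] for v in nodes]),
    }


def trophic_aggregation(nodes, edges):
    """Merge species that have the same prey set and the same predator set;
    the first such species in `nodes` order represents the merged node."""
    preds, succs = defaultdict(set), defaultdict(set)
    for s, t in edges:
        succs[s].add(t)
        preds[t].add(s)
    rep, seen = {}, {}
    for v in nodes:
        key = (frozenset(preds[v]), frozenset(succs[v]))
        rep[v] = seen.setdefault(key, v)
    new_nodes = [v for v in nodes if rep[v] == v]
    new_edges = {(rep[s], rep[t]) for s, t in edges}
    return new_nodes, new_edges


nodes = ["a", "b", "c"]
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "c")}
props = network_properties(nodes, edges)
print(props["N"], props["E"], props["self_edges"], round(props["r_auto"], 3))  # 3 4 True -0.756

web = ["plant1", "plant2", "herbivore", "carnivore"]
feeding = {("plant1", "herbivore"), ("plant2", "herbivore"), ("herbivore", "carnivore")}
agg_nodes, agg_edges = trophic_aggregation(web, feeding)
print(agg_nodes, len(agg_edges))  # ['plant1', 'herbivore', 'carnivore'] 2
```

The two plants share one predator and have no prey, so they collapse into a single trophic species; ⟨k_out⟩ is simply E/N, which matches the tabulated values (for example 96/32 = 3.000 for Leadership).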






Supplementary Table 2: Directed assortativity results. For each network and each of the four possible pairs (α, β) we show the Pearson correlation r(α, β), its error σ_r as estimated by jack-knife [12], the average Pearson correlation of the random ensemble ⟨r_rand⟩, the error of this average σ_rand (Methods), Z(α, β), and ASP(α, β).



Network      (α, β)      r(α, β)  σ_r     ⟨r_rand⟩  σ_rand        Z(α, β)   ASP(α, β)
Leadership   (out, in)   -0.157   0.123   -0.030    0.0015        -1.419    -0.391
             (in, out)   0.214    0.107   -0.015    0.0014        2.344     0.646
             (out, out)  -0.199   0.010   -0.036    0.0013        -1.844    -0.508
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-0.083 
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-0.045 


0.0013 




1.504 


0.415 


Prison 
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0.492 
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0.067 


-0.012 


0.0016 
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0.460 
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0.073 
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-0.053 
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0.0016 




-0.390 
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-6 
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-0.388 
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0.000 
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-5 
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_5 
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-0 
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Wikipedia 
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0.0028 
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-5 
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0.299 
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_5 
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-45.744 
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-5 


-0.609 
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-0.041 
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-2.285 


-0.086 
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-6.522 
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WWW Model 1 
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-7 


38.111 


0.380 
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6.5 X 10" 


-4 


5.401 


0.477 




{in, in) 


0.283 


0.045 


0.010 


9.3 X 10" 


-4 


5.495 


0.486 
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1.9 X 10" 


-5 


-29.772 
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-0 208 


2 8x10" 


-5 


-17 46R 


-0 '^72 
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-0 46Q 
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-0 255 


004 


-0 224 


S X 10" 


-5 


-2^ 062 


-0 4Q1 


French       (out, in)   -0.240   0.002   -0.210    6.2 × 10^-6   -75.777   -0.599
             (in, out)   -0.204   0.002   -0.183    1.3 × 10^-5   -49.451   -0.391
             (out, out)  -0.253   0.002   -0.220    2.8 × 10^-5   -65.006   -0.514
             (in, in)    -0.194   0.002   -0.174    4.8 × 10^-6   -59.801   -0.473
English      (out, in)   -0.226   0.001   -0.214    3.3 × 10^-6   -69.192   -0.671
             (in, out)   -0.203   0.001   -0.195    5.7 × 10^-6   -32.554   -0.316
             (out, out)  -0.193   0.001   -0.185    9.7 × 10^-6   -47.468   -0.460
             (in, in)    -0.238   0.001   -0.227    3.9 × 10^-6   -50.332   -0.488
Scrambled    (out, in)   -0.227   0.001   -0.235    4.3 × 10^-6   43.805    0.496
             (in, out)   -0.227   0.001   -0.235    5.3 × 10^-6   44.498    0.504
             (out, out)  -0.228   0.001   -0.235    5.4 × 10^-6   44.105    0.499
             (in, in)    -0.227   0.001   -0.234    4.6 × 10^-6   44.207    0.501
Bipartite    (out, in)   -0.974   0.001   -0.715    4.7 × 10^-5   -59.537   -0.511
             (in, out)   -0.973   0.001   -0.705    9.6 × 10^-5   -56.944   …
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{out, out) 
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0.001 
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-5 
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-6 
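The quantities reported in Supplementary Table 2 can be sketched in Python as below. Every modelling choice here is an assumption for illustration: r(α, β) is computed as the Pearson correlation, over all edges s → t, between the α-degree of the source and the β-degree of the target; the jack-knife uses the edge-deletion form σ_r² = Σ_e (r_e − r)²; the random ensemble comes from degree-preserving directed edge swaps; and Z is measured in units of the ensemble standard deviation, which appears consistent with the tabulated Leadership values. The function names are hypothetical.

```python
import math
import random

PAIRS = [("out", "in"), ("in", "out"), ("out", "out"), ("in", "in")]


def pearson(xs, ys):
    """Pearson correlation; NaN if undefined (fewer than two values or a constant sample)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def degrees(edges):
    """In- and out-degree of every node that appears in the edge list."""
    k = {"in": {}, "out": {}}
    for s, t in edges:
        k["out"][s] = k["out"].get(s, 0) + 1
        k["in"][t] = k["in"].get(t, 0) + 1
    return k


def r_directed(edges, alpha, beta):
    """r(alpha, beta): correlation over edges s -> t between the alpha-degree
    of the source s and the beta-degree of the target t."""
    k = degrees(edges)
    xs = [k[alpha].get(s, 0) for s, _ in edges]
    ys = [k[beta].get(t, 0) for _, t in edges]
    return pearson(xs, ys)


def jackknife_error(edges, alpha, beta):
    """Edge-deletion jack-knife (assumed form): sigma_r^2 = sum_e (r_e - r)^2,
    with r_e recomputed after removing edge e. O(E^2), fine for a sketch."""
    r = r_directed(edges, alpha, beta)
    total = sum((r_directed(edges[:i] + edges[i + 1:], alpha, beta) - r) ** 2
                for i in range(len(edges)))
    return math.sqrt(total)


def randomize(edges, n_swaps, rng):
    """Degree-preserving null model: (a->b, c->d) becomes (a->d, c->b);
    swaps creating a duplicate edge or a new self-edge are rejected, so every
    node keeps its in- and out-degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if i == j or a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present.difference_update({(a, b), (c, d)})
        present.update({(a, d), (c, b)})
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_scores(edges, n_random=100, n_swaps=None, seed=0):
    """Z(alpha, beta) = (r_real - mean of r_rand) / std of r_rand, for all four pairs."""
    rng = random.Random(seed)
    n_swaps = n_swaps or 10 * len(edges)
    ensemble = [randomize(edges, n_swaps, rng) for _ in range(n_random)]
    z = {}
    for a, b in PAIRS:
        rs = [r_directed(e, a, b) for e in ensemble]
        mean = sum(rs) / len(rs)
        std = math.sqrt(sum((x - mean) ** 2 for x in rs) / (len(rs) - 1))
        z[(a, b)] = (r_directed(edges, a, b) - mean) / std if std > 0 else float("nan")
    return z


def asp(z):
    """Assortativity Significance Profile: the Z vector scaled to unit length."""
    norm = math.sqrt(sum(v * v for v in z.values()))
    return {pair: v / norm for pair, v in z.items()}
```

As a consistency check, passing the rounded Leadership Z values from the table to `asp` gives approximately -0.391, 0.646, -0.508 and 0.414, matching the tabulated ASP row up to rounding in the last digit.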
Supplementary Table 3: Standard deviations in food-web models. We show the standard deviations in r(α, β) for 500 instances per real-world network of the cascade and niche models. Instances are constructed according to the procedure described in the Methods; note the large standard deviations of the niche model.



Network      (α, β)      σ_cascade   σ_niche
Coachella    (out, in)   0.0268      0.1501
             (in, out)   0.0235      0.0826
             (out, out)  0.0289      0.1033
             (in, in)    0.0262      0.0739
Little Rock  (out, in)   0.0178      0.1314
             (in, out)   0.0127      0.0354
             (out, out)  0.0173      0.0777
             (in, in)    0.0166      0.0642
St. Marks    (out, in)   0.0583      0.1849
             (in, out)   0.0455      0.0729
             (out, out)  0.0592      0.1341
             (in, in)    0.0592      0.1046
St. Martin   (out, in)   0.0575      0.1841
             (in, out)   0.0436      0.0759
             (out, out)  0.0603      0.1276
             (in, in)    0.0582      0.1038
Ythan        (out, in)   0.0486      0.1636
             (in, out)   0.0342      0.0566
             (out, out)  0.0463      0.1116
             (in, in)    0.0467      0.0954
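How such standard deviations arise can be sketched with the two food-web models. This Python sketch assumes the standard cascade construction (Cohen and Newman) and niche construction (Williams and Martinez), with the number of species S and the connectance C = E/S² matched to a real web; the Methods procedure referred to in the caption may include matching or rejection steps that are not reproduced here, and all function names are hypothetical. Edges point from prey to predator.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; NaN if undefined (fewer than two values or a constant sample)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def r_directed(edges, alpha, beta):
    """r(alpha, beta) over edges s -> t: alpha-degree of s vs beta-degree of t."""
    k = {"in": {}, "out": {}}
    for s, t in edges:
        k["out"][s] = k["out"].get(s, 0) + 1
        k["in"][t] = k["in"].get(t, 0) + 1
    return pearson([k[alpha].get(s, 0) for s, _ in edges],
                   [k[beta].get(t, 0) for _, t in edges])


def cascade_model(S, C, rng):
    """Cascade model: species ranked 0..S-1; each j eats each lower-ranked i
    independently with probability p = 2CS/(S-1), so E/S^2 is C on average.
    Self-edges cannot occur."""
    p = min(1.0, 2 * C * S / (S - 1))
    return [(i, j) for j in range(S) for i in range(j) if rng.random() < p]


def niche_model(S, C, rng):
    """Niche model: niche values n_i ~ U(0, 1); range r_i = n_i * x with
    x ~ Beta(1, 1/(2C) - 1); centre c_i ~ U(r_i/2, n_i). Species i eats every j
    with n_j in [c_i - r_i/2, c_i + r_i/2]; the species with the smallest niche
    value gets r = 0, so at least one basal species exists."""
    if not 0 < C < 0.5:
        raise ValueError("the niche model needs 0 < C < 0.5")
    b = 1 / (2 * C) - 1
    n = [rng.random() for _ in range(S)]
    lowest = min(range(S), key=lambda i: n[i])
    edges = []
    for i in range(S):
        r = 0.0 if i == lowest else n[i] * rng.betavariate(1, b)
        c = rng.uniform(r / 2, n[i])
        edges += [(j, i) for j in range(S) if c - r / 2 <= n[j] <= c + r / 2]
    return edges


def model_std(model, S, C, pair, n_instances=500, seed=0):
    """Standard deviation of r(alpha, beta) across model instances;
    instances where r is undefined are skipped."""
    rng = random.Random(seed)
    rs = [r_directed(model(S, C, rng), *pair) for _ in range(n_instances)]
    return statistics.stdev([r for r in rs if not math.isnan(r)])


S, C = 29, 262 / 29 ** 2  # Coachella: N = 29, E = 262 in Supplementary Table 1
for name, model in [("cascade", cascade_model), ("niche", niche_model)]:
    print(name, round(model_std(model, S, C, ("out", "in")), 4))
```

The strictly ordered cascade construction leaves little room for variation between instances, whereas the niche model draws random niche values and random range widths, which is one plausible reason for its larger spread; the exact numbers printed depend on the seed and will not reproduce the table.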
Supplementary Figure 1: This figure shows the similarities between several real-world networks in the ASP measure. Each pair of real-world networks i, j is assigned a correlation given by the dot product of their ASPs, $R_{ij} = \sum_{\alpha,\beta} \mathrm{ASP}_i(\alpha,\beta)\,\mathrm{ASP}_j(\alpha,\beta)$. This value ranges from −1 to 1, with 1 indicating highly correlated ASPs. Note that all three categories of networks are clearly visible in the heat map, with some overlap between the online networks and the word-adjacency networks. In Supplementary Figure 2 we identify the source of this overlap.
Supplementary Figure 2: This figure is constructed as in Supplementary Figure 1, but omits ASP(out, in) from the dot product. The categories are much more clearly visible, which suggests that the additional measures discussed in this paper have greater discriminatory power than the typical assortativity measure of [?]. Note, however, that the political blogs are not grouped with the other online networks; this is consistent with their lacking the (in, out) Z-assortativity of the WWW and Wikipedia.
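Because each ASP is a unit vector over the four pairs, R_ij is the cosine of the angle between two profiles, which is why it is bounded by −1 and 1. The Python sketch below computes it from ASP rows. The dictionary layout and function names are hypothetical; the ASP numbers are the Leadership, French and English rows of Supplementary Table 2; and dropping a pair without renormalizing the remaining terms is an assumption about how Supplementary Figure 2 was built (the result still satisfies |R_ij| ≤ 1).

```python
def asp_similarity(asp_i, asp_j, exclude=()):
    """R_ij = sum over (alpha, beta) of ASP_i(alpha, beta) * ASP_j(alpha, beta),
    skipping any pair listed in `exclude` (no renormalization)."""
    return sum(v * asp_j[p] for p, v in asp_i.items() if p not in exclude)


def similarity_matrix(asps, exclude=()):
    """All pairwise R_ij for a dict mapping network name -> ASP dict."""
    names = list(asps)
    return {(a, b): asp_similarity(asps[a], asps[b], exclude)
            for a in names for b in names}


PAIRS = [("out", "in"), ("in", "out"), ("out", "out"), ("in", "in")]
asps = {
    "Leadership": dict(zip(PAIRS, [-0.391, 0.646, -0.508, 0.415])),
    "French": dict(zip(PAIRS, [-0.599, -0.391, -0.514, -0.473])),
    "English": dict(zip(PAIRS, [-0.671, -0.316, -0.460, -0.488])),
}
R = similarity_matrix(asps)
print(round(R[("Leadership", "French")], 3))  # 0.046
print(round(R[("French", "English")], 3))     # 0.993
R2 = similarity_matrix(asps, exclude={("out", "in")})
print(round(R2[("French", "English")], 3))    # 0.591
```

The two word-adjacency networks are almost perfectly aligned, while the social network is nearly orthogonal to them, which is the kind of block structure the heat maps display.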